Search Results: "Peter Eisentraut"

13 June 2011

Christian Perrier: So, what happened with Kikithon?

I mentioned this briefly yesterday, but now I'll try to summarize the story of a great surprise and a big moment for me.

All this started when my wife Elizabeth and my son Jean-Baptiste wanted to do something special for my 50th birthday. So, it indeed all started months ago, probably early March or something (I don't yet have all the details). Jean-Baptiste described this well on the web site, so I won't go into details again, but basically, this was about getting birthday wishes from my "free software family" in, as you might guess, as many languages as possible.

Elizabeth brought the original idea and JB helped her by setting up the website and collecting e-mail addresses of people I usually work with: he grabbed addresses from PO files on the Debian website, plus some from his own set of GPG signatures, and here we go. And then he started poking dozens of you folks in order to get your wishes for this birthday. Gradually, contributions accumulated on the website, with many challenges for them: be sure to get as many people as possible, poking and re-poking all those FLOSS people who keep forgetting things... It seems that poking people is something that's probably in the Perriers' genes! And they were doing all this without me noticing.

As usual in Debian, releasing on time is a no-no. So, it quickly turned out that having everything ready by April 2nd wouldn't be possible. So, their new goal was offering this to me on Pentecost Sunday, which was yesterday.

And...here comes the gift. Aha, this looks like a photo album. Could it be a "50 years of Christian" album? But, EH, why is that pic of me, with the red Debconf5 tee-shirt (that features a world map) and a "bubulle" sign, in front of the book? But, EH EH EH, what the .... are these words by H0lger, then Fil, then Joey doing on the following pages? And only then, OMG, I discover the real gift they prepared.

106, often bilingual, wishes from 110 people (some were couples!). 18 postcards (one made of wood). 45 languages. One postcard with wishes from nearly every distro representative at LinuxTag 2011. Dozens of photos from my friends all around the world. All this in a wonderful album.

I can't tell what I said. Anyway, JB was shooting a video, so...we'll see. OK, I didn't cry...but it wasn't that far and emotion was really really intense.

Guys, ladies, gentlemen, friends....it took me a while to realize what you contributed to. It took me the entire afternoon to realize the investment put by Elizabeth and JB (and JB's sisters' support) into this. Yes, as many of you wrote, I have an awesome family and they really know how to share their love. I also have an awesome virtual family all around the world. Your words are wholeheartedly appreciated and some were indeed much much much appreciated.

Of course, I'll have the book in Banja Luka so that you can see the result. I know (because JB and Elizabeth told me) that many of you were really waiting to see how it would be received (yes, that includes you, in Germany, who I visited in early May!!!).

Again, thank you so much for this incredible gift.

Thank you Holger Levsen, Phil Hands, Joey Hess, Lior Kaplan, Martin Michlmayr, Alberto Gonzalez Iniesta, Kenshi "best friend" Muto, Praveen Arimbrathodiyil, Felipe Augusto van de Wiel, Ana Carolina Comandulli (5 postcards!), Stefano Zacchiroli (1st contribution received by JB, of course), Gunnar Wolf, Enriiiiiico Zini, Clytie Siddall, Frans Pop (by way of Clytie), Tenzin Dendup, Otavio Salvador, Neil McGovern, Konstantinos Margaritis, Luk Claes, Jonas Smedegaard, Pema Geyleg, Meike "spätzle queen" Reichle, Alexander Reichle-Schmehl, Torsten Werner, "nette BSD" folks, CentOS Ralph and Brian, Fedora people, SUSE's Jan, Ubuntu's Lucia Tamara, Skolelinux' Paul, Raphaël Hertzog, Lars Wirzenius, Andrew McMillan (revenge in September!), Yasa Giridhar Appaji Nag (now I know my name in Telugu), Amaya Rodrigo, Stéphane Glondu, Martin Krafft, Jon "maddog" Hall (and God save the queen), Eddy Petrișor, Daniel Nylander, Aiet Kolkhi, Andreas "die Katze geht in die Küche, wunderbar" Tille, Paul "lets bend the elbow" Wise, Jordi "half-marathon in Banja Luka" Mallach, Steve "as ever-young as I am" Langasek, Obey Arthur Liu, YAMANE Hideki, Jaldhar H. Vyas, Vikram Vincent, Margarita "Bronx cross-country queen" Manterola, Patty Langasek, Aigars Mahinovs (finding a pic *with* you on it is tricky!), Thepittak Karoonboonyanan, Javier "nobody expects the Spanish inquisition" Fernández-Sanguino, Varun Hiremath, Moray Allan, David Moreno Garza, Ralf "marathon-man" Treinen, Arief S Fitrianto, Penny Leach, Adam D. Barrat, Wolfgang Martin Borgert, Christine "the mentee overtakes the mentor" Spang, Arjuna Rao Chevala, Gerfried "my best contradictor" Fuchs, Stefano Canepa, Samuel Thibault, Eloy "first samba maintainer" París, Josip Rodin, Daniel Kahn Gillmor, Steve McIntyre, Guntupalli Karunakar, Janoš Guljaš, Karolina Kalić, Ben Hutchings, Matej Kovačič, Khoem Sokhem, Lisandro "I have the longest name in this list" Damián Nicanor Pérez-Meyer, Amanpreet Singh Alam, Héctor Orón, Hans Nordhaugn, Ivan Masár, Dr. Tirumurti Vasudevan, John "yes, Kansas is as flat as you can imagine" Goerzen, Jean-Baptiste "Piwet" Perrier, Elizabeth "I love you" Perrier, Peter Eisentraut, Jesus "enemy by nature" Climent, Peter Palfrader, Vasudev Kamath, Miroslav "Chicky" Kuře, Martín Ferrari, Ollivier Robert, Jure Čuhalev, Yunqiang Su, Jonathan McDowell, Sampada Nakhare, Nayan Nakhare, Dirk "rendez-vous for Chicago marathon" Eddelbuettel, Elian Myftiu, Tim Retout, Giuseppe Sacco, Changwoo Ryu, Pedro Ribeoro, Miguel "oh no, not him again" Figueiredo, Ana Guerrero, Aurélien Jarno, Kumar Appaiah, Arangel Angov, Faidon Liambotis, Mehdi Dogguy, Andrew Lee, Russ Allbery, Bjørn Steensrud, Mathieu Parent, Davide Viti, Steinar H. Gunderson, Kurt Gramlich, Vanja Cvelbar, Adam Conrad, Armi Beširović, Nattie Mayer-Hutchings, Joerg "dis shuld be REJECTed" Jaspert and Luca Capello. Let's say it again:

5 May 2011

Peter Eisentraut: ccache and clang

Just a note for the Internet: When you use ccache and clang together, you will probably get a lot of warnings like these:
clang: warning: argument unused during compilation: '-c'
clang: warning: argument unused during compilation: '-I .'
These are harmless, but if you want to get rid of them, use the clang option -Qunused-arguments, which will hide them. (The first one is already fixed in ccache.)

The reason for this is that ccache splits the compilation into separate calls to the preprocessor and the compiler proper, and it tries to sort out which of the options that you called it with go with which call. But since gcc doesn't complain about passing -c to the preprocessor or -I to the compiler, ccache doesn't bother about sorting this out (bug). That's why you don't lose any information relative to using gcc if you use the -Qunused-arguments option.

Also, if you like clang's colored diagnostics messages, you'll have to turn them on explicitly with -fcolor-diagnostics, because when running through ccache, clang doesn't think it's printing to a terminal and turns off the color by default.

So a complete invocation might look like this:
./configure CC='ccache clang -Qunused-arguments -fcolor-diagnostics'

10 March 2011

Peter Eisentraut: My new Git pre-commit hook

This appears to be kind of useful:
#!/bin/sh

# Emacs lock files (.#name) indicate buffers with unsaved changes.
output=$(find . -name '.#*' -print)
if [ -n "$output" ]; then
    echo "unsaved Emacs files:" 1>&2
    echo "$output" 1>&2
    exit 1
fi
Had that kind of problem a few times. :-)

Now what would be really handy are system-wide Git hooks that apply to all repositories, like ~/.gitignore complements .git/info/exclude.

6 February 2011

Peter Eisentraut: Squeeze + PostgreSQL = Broken

The PostgreSQL package in Debian squeeze, just released, is linked with libedit instead of libreadline. This has two interesting properties:
If either of these is a concern, think carefully before you upgrade.

Is there a way to at least configure libedit to accept non-ASCII characters?

16 January 2011

Peter Eisentraut: Going

I'm going to FOSDEM, the Free and Open Source Software Developers' European Meeting
me too

7 January 2011

Peter Eisentraut: Git commit mode

Hardly anything ruins a glorious day of coding like fat-fingering the commit message late at night as you doze off, and then pushing it out for the world to see. To prevent that, I have equipped my Emacs configuration with a few little tools now.

First, I found the git-commit-mode, a special mode for Git commit messages. This helps you format the commit messages according to convention, and will use ugly colors if, for example, you write lines that are too long or you do not keep the second line blank. It also allows the use of things like M-q without shredding the whole file template.

Second, I integrated on-the-fly spell checking into the git-commit-mode. It won't stop you from writing nonsense, but it will catch the silly mistakes.

Here's a simple configuration snippet:
(require 'git-commit)
(add-hook 'git-commit-mode-hook 'turn-on-flyspell)
(add-hook 'git-commit-mode-hook (lambda () (toggle-save-place 0)))
The last line is handy if you have save-place on by default. When you make a new commit, it would then normally place the cursor where a previously edited commit message was finished, because to the save-place functionality, it looks as though it's the same file.

4 November 2010

Peter Eisentraut: pipefail

It is widely considered good style to include
set -e
near the beginning of a shell script so that it aborts when there is an uncaught error. The Debian policy also recommends this.

Unfortunately, this doesn't work in pipelines. So if you have something like
some_command | sed '...'
a failure of some_command won't be recognized.

By default, the return status of a pipeline is the return status of the last command. So that would be the sed command above, which is usually not the failure candidate you're worried about. Also, the definition of set -e is to exit immediately if the return status of the last pipeline is nonzero, so the exit status of some_command isn't considered there.

Fortunately, there is a straightforward solution, which might not be very well known. Use
set -o pipefail
With pipefail, the return status of a pipeline is "the value of the last (rightmost) command to exit with a non-zero status, or zero if all commands exit successfully". So if some_command fails, the whole pipeline fails, and set -e kicks in. So you need to use set -o pipefail and set -e together to get this effect.
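
To see the difference for yourself, here is a small sketch. It assumes bash is installed; `false` stands in for a failing some_command and `cat` for the sed stage, both placeholders chosen only for illustration.

```shell
#!/bin/bash
# Run the same failing pipeline in child bash processes and record
# the pipeline's exit status with and without pipefail.

# Default behaviour: the status of the pipeline is that of `cat`.
without=$(bash -c 'false | cat; echo $?')

# With pipefail: the failure of `false` becomes the pipeline status.
with=$(bash -c 'set -o pipefail; false | cat; echo $?')

# With set -e added, the child script stops at the failing pipeline,
# so the trailing echo is never reached and bash exits nonzero.
if bash -c 'set -e -o pipefail; false | cat; echo reached'; then
    aborted=no
else
    aborted=yes
fi

echo "status without pipefail: $without"
echo "status with pipefail:    $with"
echo "aborted under set -e:    $aborted"
```

The three lines printed should show status 0 without pipefail, status 1 with it, and "yes" for the combination with set -e.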

This only works in bash, so if you're trying to write scripts that conform to POSIX or some other standard, you can't use it. (There are usually other ways to discover failures in pipelines in other shells, but none are as simple as this one, it appears.) But if you are writing bash anyway, you should definitely use it. And if you're not using bash but use a lot of pipelines in your scripts, you should perhaps consider using bash.

(Hmm, it looks like there could be a number of latent bugs in the existing Debian package maintainer scripts, because this issue appears to be widely ignored.)

1 November 2010

Peter Eisentraut: Git User's Survey 2010 Results

The results of the Git User's Survey 2010 are up.

Not many surprises, but I can see how this sort of survey is very useful for the developers of Git.

5 October 2010

Peter Eisentraut: Git User's Survey 2010

The Git User's Survey 2010 is up. Please devote a few minutes of your time to fill out the simple questionnaire; it'll help the Git community understand your needs, what you like about Git (and what you don't), and help improve it.

The survey is open from 1 September to 15 October, 2010.

Go to https://git.wiki.kernel.org/index.php/GitSurvey2010 for more information.

3 July 2010

Peter Eisentraut: Increasing the priority of Debian experimental

Many people run Debian testing or unstable or some mix thereof. This works pretty well for a development system or a general desktop system if you know a bit about what you're doing (note: nonetheless officially not recommended). Sometimes you throw packages from experimental into the mix, if you want to get the latest stuff that isn't yet fully integrated into the rest of Debian.

The default APT priority of the Debian experimental release is 1, which ensures that it is never automatically installed or upgraded. This is not always ideal, in my experience. Of course, you don't want a package from experimental to take precedence over a package from stable, testing, or unstable by default. But I think when you have in the past installed a package from experimental, you probably want to pull in upgrades to that package from experimental as well. Otherwise, you will end up with completely unmaintained packages on your system. That is because in practice many packages in experimental are not actually experimental or broken or unmaintained, but just an advance branch of some software that is just for some reason not ready to go down the unstable-testing-stable road.

To make this work better, I have set the priority of experimental to 101 in my /etc/apt/preferences:
Package: *
Pin: release experimental
Pin-Priority: 101
Now the following will happen: If you just apt-get install a package, it will come from whatever "normal" release you have in your sources.list, say stable or testing. You can override that using -t experimental as usual. If you install a package from experimental and later an upgrade is available in experimental, apt-get upgrade will install that automatically. Also, if an upgrade in a "normal" release appears that has a higher version number, that version will be installed.

Of course, caveats apply. Some software in experimental is really experimental and should only be installed under close supervision. If a package is available only in experimental, this setup will install it when you ask for the package, even if you might not have actually wanted it if you had known that it was in experimental. Figure it out yourself. :)

Similar considerations apply to backports. I use
Package: *
Pin: release a=lenny-backports
Pin-Priority: 102
On the system I have in mind here, the standard distribution is stable, testing is 101, and backports is 102, taking precedence over testing. That is because for some architecture-independent packages you don't need backports; you can pull them directly from testing that way.

In general, the APT priority business is relatively powerful and often a good alternative to, say, manually downloading packages from various distributions, installing them manually, forgetting where they came from, and never upgrading them.

10 March 2010

Peter Eisentraut: Looking for Free Hosting

I'm looking for a way to do free hosting. But I mean free as in freedom, not free as in beer. Let me explain.

When I'm using a piece of free and open-source software such as OpenOffice.org, Evolution, or anything else, I have certain possibilities, freedoms if you will, of interacting with the software beyond just consuming it. I can look at the source code to study how it works. I can rebuild it to have a higher degree of confidence that I'm actually running that code. I can fix a bug or create an enhancement. I can send the patch upstream and wait for the next release, or in important cases I can create a local build. With the emergence of new project hosting sites such as GitHub, it's getting even easier to share one's modifications so others can use them. And so on.

As a lot of software moves to the web, how will this work in the future? There are those that say that it won't, and that it will be a big problem, and that's why you shouldn't use such services. Which is what probably a lot of free-software conscious users are doing right now. But I think that in the longer run, resisting isn't going to win over the masses to free software.

First of all, of course, the software would need to be written. So a free web office suite, a free web mail suite that matches the capabilities of the leading nonfree provider, and so on. We have good starts with Identi.ca and OpenStreetMap, for example, but we'd need a lot more. Then you throw it on a machine, and people can use it. Now as a user of this service, how do I get the source code? Of course you could offer a tarball for download, and that is the approach that the AGPL license takes. One problem with that is, if you are used to apt-get source or something similar for getting the source, everyone putting a tarball on their web site in a different place isn't going to make you happy. A standardized packaging-type thing ought to be wrapped around that. Another problem is that even if you trust the site's operator that that's the source code that's actually running on your site (even without malice, it could for example be outdated against the deployed version), it probably won't contain the local configuration files and setup scripts that would allow me to duplicate the service. And if I just want to study how the program is running in actuality, there is not much I can do.

Giving everyone SSH access to the box is probably not a good idea, and won't really solve all the issues anyway. In the future, when virtualization is standardized, ubiquitous, and awesome, one might imagine that a packaging of a web service won't be "put these files on the file system and reload a daemon" but instead "put these files and this configuration on that machine and activate it". This might give rise to a new generation of Linux "distributors". Getting the source tarball or "source package" might then involve getting a snapshot of that image, which you can examine, modify, and redeploy elsewhere. That could work for OpenStreetMap, for example, modulo the space and time required for their massive database. (But you might choose to fork only the code, not the data.) But it won't be easy to do the right thing in many cases, because with a web service, there is usually other people's data on the machine as well, which would need to be masked out or something. Maybe this really can't be done correctly, and the future will be more distributed, like in the way Jabber attempted to supplant centralized services such as ICQ. Distributed web mail makes sense, distributed OpenStreetMap perhaps less so.

Ideas anyone? Does anyone perhaps have experiences with running a web service that attempts to give users the freedoms and practical benefits that are usually associated with locally installed software?

31 January 2010

Peter Eisentraut: Going ...

I'm going to FOSDEM, the Free and Open Source Software Developers' European Meeting

See you there! Or maybe even there.

Wait ... I have the last slot on Saturday and the first slot on Sunday?!? Great! :^)

25 January 2010

Peter Eisentraut: PostgreSQL: The Universal Database Management System

I'm glad you asked, since I've been pondering this for a while. $subject is my new project slogan. Now I'm not sure whether we can actually use it, because a) it's stolen from Debian, and b) another (commercial, proprietary) database product already uses the "universal database" line.

I have come to appreciate that the "universality" of a software proposition can be a killer feature. For example, Debian GNU/Linux, the "universal operating system", might not be the operating system that is the easiest to approach or use, but once you get to know it, the fact that it works well and the same way on server, desktop, and embedded ensures that you never have to worry about what operating system to use for a particular task. Or Python, it's perhaps not the most geeky nor the most enterprisy programming language, but you can use it for servers, GUIs, scripting, system administration, like few other languages. It might as well be the "universal programming language". A lot of other software is not nearly universal, which means that whenever you move into a new area, you have to learn a lot of things from scratch and cannot easily apply and extend past experiences. And often the results are then poor and expensive.

The nice thing about PostgreSQL is that you never have to worry about whether to use it, because you can be pretty sure that it will fit the job. Even if you don't care whether something is "open source" or "most advanced". But it will fit the job. The only well-known exception is embedded databases, and frankly I think we should try to address that.

5 January 2010

Peter Eisentraut: Remove and Purge

Debian's package manager dpkg has the perhaps unique feature that it distinguishes between removing and purging a package. Removing it removes the program files but keeps the configuration files (and sometimes the logs) around, purging it really removes everything. While this distinction undoubtedly has some uses, I have found that I almost never make use of it. I think in about six years of using Debian I have actually needed a remove-but-not-purge functionality about five times, during some really tricky upgrades (and using Aptitude instead of APT might have helped, not sure) and once when I wanted to build a package that had a build dependency that conflicted with a package I had installed (cowbuilder came later).

I think many people don't fully realize this distinction, and thus aged systems will often contain dozens or hundreds of removed-but-not-purged packages lying around. Great fun cleaning that up. And therefore, at some point in the distant past I have switched all my APTs to purge by default, using the configuration setting Apt::Get::Purge "true";. At the time I thought this would be daring, but I have never looked back. The one time a year that I don't want to purge I override this by hand.

Later, APT actually got an apt-get purge command, but there is no apt-get autopurge and no apt-get dist-upgrade-and-purge (or whatever) to purge the packages it wants to remove. This can be worked around by carefully adding --purge to all invocations of apt-get, but who will remember that. And of course apt-get remove is hardwired into my fingers.

How do other people handle this? Are there undiscovered reasons removing is the better default? How do you clean up packages that were forgotten to be purged?
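
One possible answer to the last question, as a minimal sketch: in `dpkg -l` output, removed-but-not-purged packages carry the status "rc" in the first column (desired state remove, current state config-files). The helper name `list_residual_config` and the sample listing below are made up for illustration.

```shell
#!/bin/sh
# Print the names of packages whose dpkg -l status is "rc", reading a
# dpkg -l style listing on standard input.
list_residual_config() {
    awk '$1 == "rc" { print $2 }'
}

# Made-up sample listing so the sketch can be tried without dpkg.
sample='ii  bash    5.0  amd64  GNU Bourne Again SHell
rc  oldpkg  1.0  all    example removed package'

printf '%s\n' "$sample" | list_residual_config

# On a real system (not run here):
#   dpkg -l | list_residual_config
#   dpkg -l | list_residual_config | xargs -r dpkg --purge   # as root
```

Reviewing the list before piping it into dpkg --purge seems wise, since purging deletes the configuration files for good.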

23 December 2009

Peter Eisentraut: Patience

Occasionally, there are concerns expressed about the adoption rate of Python 3. Now that PostgreSQL 8.5alpha3 is released with Python 3 support in PL/Python, let's see what the schedule might be until this hits real use.

Python 3.0 was released in December 2008 and was admitted to be somewhat experimental. At that point, PostgreSQL 8.4 was already in some kind of freeze, so adding a feature as significant as Python 3 support was not feasible.

In fact, we opted to do two significant rounds of fixing/enhancing/refactoring of PL/Python before tackling Python 3 support: fixed byte string (bytea) support and Unicode support. Both of those benefit Python 2 users, and they made the eventual port to Python 3 quite simple.

PostgreSQL 8.5 might release around May 2010. Debian squeeze is currently in testing and doesn't even contain Python 2.6 or Python 3.1 yet, due to some technical problems and a bit of infighting. Debian freezes in March 2010, which means without PostgreSQL 8.5 but hopefully with Python 2.6 and 3.1. The final release of squeeze should then be later in 2010, which means the earliest that a significant number of Debian users are going to look into moving any of their code at all nearer to Python 3 (via 2.6) is going to be late 2010. (Other operating system distributions will have other schedules, of course.)

The next Debian release (squeeze+1), which will presumably include PostgreSQL 8.5 or later and a solid Python 3.x environment, will then not be released before January 2012, if we believe the current 18-months-plus-slip cycle of Debian. So it will be mid-2012 until significant numbers have upgraded Debian and PostgreSQL to the then-current versions. If you are sticking to a stable and supported operating system environment, this is the earliest time you actually have the option to migrate to the Python 3 variant of PL/Python across the board for your applications. Of course in practice this is not going to be the first thing you are going to do, so by the time you actually port everything, it might be late 2012 or even 2013. Also, if you are heavily invested in PL/Python, you are probably not going to upgrade much of your other Python code before PL/Python is ready.

This will then be 3 years after the PL/Python 3 code is written and 4 years after the release of Python 3.0. And 2 years before Python 2.x is expected to go out of maintenance in 2015.

So, to all users and developers: patience.

Incidentally, I fully expect to still be using IPv4 by then. ;-)

29 October 2009

Peter Eisentraut: A History of Tarballs

I have been maintaining the autoconfigury of PostgreSQL for many years now, and every once in a while I go to ftp://ftp.gnu.org/gnu/autoconf/ to check out a new version of Autoconf. That FTP listing is actually an interesting tale of how tarball creation practices have evolved over the years.

Obviously, .tar.gz has been the standard all along. Some projects have now completely abandoned .tar.gz in favor of .tar.bz2, but those are rare. I think most ship both now. The FTP listing goes back to 1996; the first .tar.bz2 was shipped in 2001.

RPM-based distributions switched to supporting and then requiring bzip2-compressed tarballs many years ago. Debian might start supporting that with the next release. So if you want to be able to trace your pristine tarballs throughout the popular Linux distributions, shipping both is best.

One thing that was really popular back then but is almost forgotten now is providing patches between versions, like autoconf-2.12-2.13.diff.gz. The Linux kernel still does that. Autoconf stopped doing that in 1999, when it was replaced by xdelta. Anyone remember that? This lasted until 2002 and was briefly revived in 2008. I think shipping xdeltas is also obsolete now except possibly for huge projects.

In 2003, they started signing releases. First with ASCII-armored signatures (.asc), now with binary signatures (.sig). The Linux kernel also does this, except they call the ASCII-armored signatures .sign.

In 2008, we saw the latest invention, LZMA-compressed tarballs (.tar.lzma). They appear to compress better than bzip2 by about as much as bzip2 wins over gzip. But, this one's already obsolete because it was replaced in 2009 by LZMA2, which goes by the file extension .tar.xz. Some "early adopters" such as Debian's packaging tool dpkg are in the process of adding xz support in addition to the short-lived lzma support.

Throughout all this, interestingly, tar hasn't changed a bit. Well, there are various incompatible extended tar formats around, but when this becomes a problem, people tend to revert to GNU tar.

GNU tar, by the way, supports all the above compression formats internally. gzip is -z, bzip2 is -j, lzma is, well, --lzma, and xz is -J. And Automake supports creating all these different formats for source code distributions.
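
As a small illustration of those options, here is a sketch that picks the GNU tar compression flag from the archive's file name. The function names `tar_compress_flag` and `make_tarball` are hypothetical helpers, not part of tar or Automake.

```shell
#!/bin/sh
# Map an archive file name to the GNU tar compression option listed above.
tar_compress_flag() {
    case "$1" in
        *.tar.gz)   echo "-z" ;;
        *.tar.bz2)  echo "-j" ;;
        *.tar.lzma) echo "--lzma" ;;
        *.tar.xz)   echo "-J" ;;
        *)          echo "unknown archive type: $1" >&2; return 1 ;;
    esac
}

# Hypothetical wrapper: pack directory $2 into archive $1, choosing the
# compression from the archive's suffix. Requires GNU tar and the
# matching compressor to be installed.
make_tarball() {
    _flag=$(tar_compress_flag "$1") || return 1
    tar -c "$_flag" -f "$1" "$2"
}

# Example call (not run here):
#   make_tarball myproject-1.0.tar.xz myproject-1.0
```

When extracting, current GNU tar usually detects the compression on its own, so the flag matters mostly when creating archives.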

13 May 2009

Lucas Nussbaum: UDD and packages metrics

Peter Eisentraut played with Ultimate Debian Database, and wanted to create a maintenance effort metric by multiplying each package's installed size by its popcon. His query is:
SELECT rank() OVER (ORDER BY score DESC), source,
sum(installed_size::numeric * insts) AS score
FROM packages JOIN popcon USING (package)
WHERE distribution = 'debian' AND release = 'sid'
AND component = 'main' AND architecture IN ('all', 'i386')
GROUP BY source, version ORDER BY score DESC LIMIT 30;
Besides all the interesting things that I learnt by looking at his query (rank(), and a bug in UDD because installed_size should really be numeric to avoid the conversion), Peter had a problem with his query: linux-2.6 is missing from the results, while it should obviously have a large popcon and a large install size. The problem is that the binary packages for linux-2.6 often change, so they don't get very high in popcon. The unstable kernel package gets a ridiculous popcon score:
select package, insts from popcon
where package in (select package from packages where source ='linux-2.6' and release='sid')
order by insts desc limit 30;
            package            | insts
-------------------------------+-------
 linux-libc-dev                | 38703
 linux-source-2.6.29           |   614
 linux-headers-2.6.29-2-common |   256
 linux-image-2.6.29-2-amd64    |   239
A solution could be to change the metric to be: MAX(insts over all binary packages from this source package) * SUM(installed_size)
The good thing is that UDD already offers a popcon_src view, that gives the popcon score for a source package. So the query becomes:
SELECT rank() OVER (ORDER BY score DESC), source,
sum(installed_size::numeric * insts) AS score
FROM packages JOIN popcon_src USING (source)
WHERE distribution = 'debian' AND release = 'sid'
AND component = 'main' AND architecture IN ('all', 'i386')
GROUP BY source ORDER BY score DESC LIMIT 30;
 rank |     source     |    score
------+----------------+-------------
    1 | openoffice.org | 92177633504
    2 | qt4-x11        | 18503941620
    3 | linux-2.6      | 16036201020
    4 | gcc-4.3        | 14369300376
    5 | mesa           | 12962475968
    6 | eglibc         | 12581290240
    7 | gcc-4.4        | 11411296672
    8 | samba          | 10021083072
    9 | xulrunner      |  9037295424
   10 | mysql-dfsg-5.0 |  8348333532
This time, linux-2.6 shows up near the top of the list.

Peter Eisentraut: The Big Shots

As the occasional thinker about open-source development practices, communities, and issues, I have been wondering for a while: What are the largest open-source projects? What projects have the most code, the most users, and the most issues to deal with, and how do they cope?

The Debian archive should provide some insights into the first one or two questions, as it contains a very large portion of all available and relevant open-source software and exposes them in a fairly standard form. In the old days one might have gotten out grep-dctrl to create some puzzling statistics, but nowadays this information is actually available in an SQL database: the Ultimate Debian Database (UDD). (And it's in PostgreSQL. And it comes with a postgresql_autodoc-generated schema documentation. Excellent.)

So here is a first question. Well, the zeroth question would have been, which source packages have the largest unpacked orig tarball, but that information doesn't seem to be available, either via UDD or via apt. So the first question anyway is, which source packages produce the largest installation size across all their binary packages:
udd=> SELECT source, sum(installed_size)/1024 AS mib
      FROM packages
      WHERE distribution = 'debian' AND release = 'sid'
        AND component = 'main' AND architecture IN ('all', 'i386')
        AND section <> 'debug'
      GROUP BY source, version ORDER BY mib DESC LIMIT 30;
      source       | mib
-------------------+------
 openoffice.org    | 1797
 kde-l10n          |  648
 gcj-4.4           |  544
 vtk               |  465
 linux-2.6         |  404
 openclipart       |  353
 vegastrike-data   |  311
 ghc6              |  308
 gclcvs            |  303
 wesnoth           |  300
 fpc               |  269
 axiom             |  256
 webkit            |  255
 gcc-snapshot      |  255
 lazarus           |  241
 kdebase-workspace |  226
 plt-scheme        |  221
 torcs-data-tracks |  219
 scilab            |  213
 openscenegraph    |  211
 eclipse           |  210
 sagemath          |  201
 insighttoolkit    |  198
 acl2              |  195
 kdebindings       |  181
 atlas             |  165
 gcl               |  163
 trilinos          |  153
 paraview          |  153
 asterisk          |  144
(30 rows)
This produces a few well-known packages, but also a number of obscure ones. If you look closer, many of them appear to be themed around scientific, numerical, visualization, Scheme, Lisp, that sort of thing. Hmm.

Here is another idea. Take a package's installation footprint and multiply it by its popularity contest installation count. So you get some kind of maintenance effort score, either because the package is large or because you have a lot of users or both.
SELECT rank() OVER (ORDER BY sum(installed_size::numeric * insts) DESC), source, sum(installed_size::numeric * insts) AS score FROM packages JOIN popcon USING (package) WHERE distribution = 'debian' AND release = 'sid' AND component = 'main' AND architecture IN ('all', 'i386') GROUP BY source, version ORDER BY score DESC LIMIT 30;
 rank |           source            |    score
------+-----------------------------+-------------
    1 | openoffice.org              | 12638492332
    2 | mysql-dfsg-5.0              |  3411344560
    3 | eglibc                      |  3371485240
    4 | perl                        |  3019183024
    5 | evolution                   |  2669948000
    6 | samba                       |  2308923872
    7 | mesa                        |  1853902860
    8 | texlive-base                |  1684245516
    9 | gcj-4.3                     |  1610495484
   10 | foomatic-db-engine          |  1608178104
   11 | foomatic-db                 |  1423947704
   12 | inkscape                    |  1413910080
   13 | qt4-x11                     |  1258220636
   14 | gcc-4.3                     |  1248741312
   15 | kdelibs                     |  1021058256
   16 | gnome-applets               |   998434136
   17 | xulrunner                   |   958232688
   18 | coreutils                   |   954766896
   19 | openssl                     |   877067672
   20 | ncurses                     |   827679424
   21 | python2.5                   |   815826384
   22 | aptitude                    |   808161380
   23 | gimp                        |   786015124
   24 | gnome-utils                 |   781756328
   25 | nautilus                    |   774319690
   26 | openoffice.org-dictionaries |   761075576
   27 | eclipse                     |   756072380
   28 | dpkg                        |   736626200
   29 | openclipart                 |   731244240
   30 | wine                        |   707967500
(30 rows)
(Yeah, they run this thing on PostgreSQL 8.4 beta 1.)

I noticed linux-2.6 is suspiciously absent because of a low popcon score (?!?).
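
To make the scoring idea concrete, here is a small sketch of the same computation outside the database. All package names, sizes, and installation counts below are invented, and the function name `scores` is just for illustration. It also simplifies by grouping on source name only, whereas the query groups on source and version.

```python
from collections import defaultdict

# Hypothetical rows: (binary package, source package, installed size in KiB)
packages = [
    ("foo-bin", "foo", 2000),
    ("foo-data", "foo", 500),
    ("bar", "bar", 10000),
]
# Hypothetical popcon installation counts per binary package.
popcon = {"foo-bin": 900, "foo-data": 800, "bar": 50}


def scores(packages, popcon):
    """Return (rank, source, score) tuples, highest score first."""
    totals = defaultdict(int)
    for package, source, size in packages:
        if package in popcon:  # like the inner join: no popcon row, no contribution
            totals[source] += size * popcon[package]
    ordered = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    ranked = []
    for position, (source, score) in enumerate(ordered, start=1):
        # Like rank(): tied scores share the rank of the first row in the tie.
        if ranked and ranked[-1][2] == score:
            rank = ranked[-1][0]
        else:
            rank = position
        ranked.append((rank, source, score))
    return ranked


for row in scores(packages, popcon):
    print(row)
```

In this toy data the much larger "bar" ends up behind "foo" because it has few recorded installations, which is the same effect that keeps a large package with a low popcon count out of the top of the list.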

I don't want to dump the entire database into this blog post, but if you try this yourself you can look at about the first 200 to 300 places to find reasonably large and complex projects before it gets a bit more obscure. A few highlights:
   51 | gnupg                       |   455660464
   59 | php5                        |   386417572
   60 | mutt                        |   381148176
   83 | icu                         |   258602756
   84 | xorg-server                 |   255186332
  101 | exim4                       |   224857700
  107 | openssh                     |   215792828
  113 | tar                         |   201520400
  114 | postgresql-8.3              |   196844584
  115 | libx11                      |   195856564
  116 | ruby1.8                     |   194681656
  272 | emacs22                     |    62047476
This is obviously still biased in a lot of ways, but it does show the major projects.

The UDD is also an interesting use case that shows how you can deploy a PostgreSQL database as a semi-public service with direct access. A great tool, and a great tool to build other great tools on top of.

16 April 2009

Peter Eisentraut: Web browsers vs. debtags

So I wanted to see what web browsers are available in Debian. The first stop was http://packages.debian.org/. Going to the page of one browser package and clicking on "Browser" in the tags area only gives you the explanation of the tag, but not the list of other packages with that tag. Is that available somewhere?

So next try maybe grep-dctrl ... oh, grep-debtags appears to be the ticket.

$ grep-debtags -n -sPackage web::browser
arora
browser-history
caudium-dev
caudium-modules
caudium-perl
caudium-pixsl
caudium-ultralog
chimera2
conkeror
cookietool
dhelp
edbrowse
elinks
elinks-lite
elvis
elvis-console
epiphany-browser
epiphany-browser-dev
epiphany-extensions
epiphany-gecko
ezmlm-browse
galeon
galeon-common
gtkcookie
iceweasel
jsmath
junior-internet
kazehakase
konq-plugins
links
links2
lynx
lynx-cur
lynx-cur-wrapper
mozilla-firefox-adblock
mozilla-noscript
netrik
netsurf
saods9
stripclub
sugar-web-activity
surfraw
w3m
w3m-el
w3m-img
wapua
claws-mail-dillo-viewer
dillo
konqueror
midori

How many of those are actually web browsers? Probably about half of them. (Example: caudium (not listed) is a web server, caudium-dev is its development package, not very close to a web browser.)

This would actually be quite a useful interface if the tags had any relationship to reality. I was in fact looking for a lightweight graphical browser, so this is a plausible command:

$ grep-debtags -n -d -sPackage web::browser -a interface::x11 -a -! suite::gnome -a -! suite::kde

which gives me 15 hits, of which 8 or 9 are actual web browsers.
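
The boolean logic of that command (must have these tags, must not have those) boils down to a couple of set operations. Here is a minimal sketch of it; the package names and their tag assignments are hypothetical, and `select` is an illustrative helper, not part of any debtags tool.

```python
# Hypothetical package-to-tags mapping; real tags come from the debtags database.
TAGS = {
    "alpha-browser": {"web::browser", "interface::x11"},
    "beta-browser": {"web::browser", "interface::x11", "suite::gnome"},
    "gamma-browser": {"web::browser", "interface::x11", "suite::kde"},
    "delta-text-browser": {"web::browser", "interface::text-mode"},
    "epsilon-server": {"web::server"},
}


def select(tags_by_package, required, excluded):
    """Packages carrying every tag in `required` and none in `excluded`."""
    return sorted(
        name
        for name, tags in tags_by_package.items()
        if required <= tags and not (excluded & tags)
    )


print(select(TAGS, {"web::browser", "interface::x11"}, {"suite::gnome", "suite::kde"}))
```

Of course, a filter like this is only as good as the tags it is fed, which is exactly the problem here.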

Well, my search for a lightweight browser stopped here:

iceweasel
lightweight web browser based on Mozilla

Yeah! ;-)

3 February 2009

Peter Eisentraut: Debian PostgreSQL Packaging Project

We have launched the Debian PostgreSQL Packaging Project at http://pkg-postgresql.alioth.debian.org/. We are a group of people interested in maintaining Debian packages related to PostgreSQL.

Obviously, Debian already contains a large number of packages related to PostgreSQL. The idea behind this project is to get the involved maintainers together, concentrate resources, exchange ideas, and allow more people to get involved in small ways. See our web site about participating.
